# Using SLURM for Large-Scale Analysis This guide covers deploying HiTMicTools on SLURM clusters for high-throughput processing of large microscopy datasets. ## Overview SLURM (Simple Linux Utility for Resource Management) enables: - **Parallel Processing**: Process hundreds of images simultaneously across multiple nodes - **Job Arrays**: Split large experiments into manageable batches - **Resource Management**: Dedicated GPU/CPU allocation for each task - **Queue System**: Automatic job scheduling and execution - **Scalability**: Handle experiments too large for local workstations ## Workflow Summary 1. Split your dataset into batches using `split-files` 2. Generate a SLURM submission script using `generate-slurm` 3. Customize the script for your cluster configuration 4. Submit the job array to SLURM 5. Monitor progress and collect results ## 1. Prerequisites ### Cluster Access Ensure you have: - SSH access to your SLURM cluster - Conda environment set up on the cluster - HiTMicTools installed in the cluster environment - Model collection files accessible on the cluster (shared filesystem or local copy) ### Environment Setup on Cluster ```bash # SSH to cluster ssh username@your-cluster.domain # Create conda environment (one-time setup) conda create -n hitmictools python=3.9 conda activate hitmictools # Install HiTMicTools pip install git+https://github.com/phisanti/HiTMicTools # Optional: Install btrack for tracking support git clone https://github.com/quantumjot/btrack.git cd btrack && bash build.sh && pip install . && cd .. ``` ### File Organization Set up your project on the cluster: ``` /your/cluster/path/project/ ├── data/ # Input images │ ├── experiment_001.nd2 │ ├── experiment_002.nd2 │ └── ... ├── results/ # Output directory (created automatically) ├── config/ │ └── analysis_config.yml # Your configuration file ├── models/ │ └── model_collection_tracking_20250529.zip ├── temp/ # File blocks (created by split-files) └── SLURM_jobs_report/ # Job logs (created automatically) └── job_name/ ├── jobid_HiTMicTools.out └── jobid_HiTMicTools.err ``` ## 2. Splitting Files into Batches The `split-files` command divides your dataset into manageable chunks for parallel processing. ### Basic Usage ```bash # Split 100 images into 10 batches (10 images each) hitmictools split-files \ --target-folder ./data \ --n-blocks 10 \ --output-dir ./temp ``` This creates files: ``` temp/ ├── file_block_0.txt # Files 1-10 ├── file_block_1.txt # Files 11-20 ├── ... └── file_block_9.txt # Files 91-100 ``` ### Advanced Options ```bash # Split with filtering and full paths hitmictools split-files \ --target-folder /path/to/images \ --n-blocks 20 \ --output-dir ./temp \ --file-pattern "experiment_A.*" \ --file-extension ".nd2" \ --return-full-path ``` **Parameters:** - `--target-folder`: Directory containing files to split (required) - `--n-blocks`: Number of batches to create (required) - `--output-dir`: Where to save block files (default: `./temp`) - `--file-pattern`: Regex pattern to filter files (optional) - `--file-extension`: File extension filter (e.g., `.nd2`, `.tiff`) - `--return-full-path`: Write full paths vs. filenames only (default: True) - `--no-return-full-path`: Write only filenames (requires working dir change in SLURM script) ### Determining Block Count Choose `n-blocks` based on: - **Total files**: More files = more blocks for better parallelization - **Cluster limits**: Check maximum array size (`scontrol show config | grep MaxArraySize`) - **Processing time**: Aim for 1-6 hours per block (optimal for queue priority) - **Memory**: More blocks = more concurrent jobs = more total memory needed **Example calculations:** - 100 files, 30 min each → 10 blocks (5 hours/block) - 500 files, 10 min each → 25 blocks (3.3 hours/block) - 50 files, 2 hours each → 5 blocks (20 hours/block) ## 3. Generating SLURM Scripts The `generate-slurm` command creates a submission script tailored to your cluster. ### Basic Command ```bash hitmictools generate-slurm \ --job-name 'my_analysis' \ --file-blocks \ --n-blocks 10 \ --conda-env 'hitmictools' \ --config-file './config/analysis_config.yml' ``` This creates a script that: - Processes 10 file blocks as an array job - Uses the `hitmictools` conda environment - Runs the analysis defined in `analysis_config.yml` ### All Available Options ```bash hitmictools generate-slurm \ --job-name 'experiment_001' \ --config-file './config/analysis_config.yml' \ --file-blocks \ --n-blocks 10 \ --conda-env 'hitmictools' \ --email 'your.email@domain.com' \ --partition 'rtx4090' \ --qos 'gpu6hours' \ --time '06:00:00' \ --memory '25G' \ --gpu-count 1 \ --cpu-count 4 \ --work-dir '/path/to/project' ``` **Parameters:** | Parameter | Description | Default | |-----------|-------------|---------| | `--job-name` | SLURM job name | Required | | `--config-file` | Path to config YAML | Required | | `--file-blocks` | Enable array job mode | False | | `--n-blocks` | Number of array tasks | 10 | | `--conda-env` | Conda environment name | `img_analysis` | | `--email` | Email for notifications | `your.email@unibas.ch` | | `--partition` | SLURM partition | `rtx4090` | | `--qos` | Quality of service | `gpu6hours` | | `--time` | Max walltime | `06:00:00` | | `--memory` | RAM per CPU | `25G` | | `--gpu-count` | GPUs per task | 1 | | `--cpu-count` | CPUs per task | 4 | | `--work-dir` | Project directory | Current directory | ### Understanding the Generated Script A typical generated script looks like: ```bash #!/bin/bash #SBATCH --job-name=my_analysis #SBATCH --mail-user=your.email@unibas.ch #SBATCH --mail-type=END,FAIL #SBATCH --time=06:00:00 #SBATCH --qos=gpu6hours #SBATCH --mem-per-cpu=25G #SBATCH --partition=rtx4090 #SBATCH --gres=gpu:1 #SBATCH --ntasks=1 #SBATCH --cpus-per-task=4 #SBATCH --array=0-9 # 10 tasks (0-9) #SBATCH --output=./SLURM_jobs_report/my_analysis/%A_%a_HiTMicTools.out #SBATCH --error=./SLURM_jobs_report/my_analysis/%A_%a_HiTMicTools.err # Load modules module load Python module load CUDA module load jobstats # Create log directory mkdir -p ./SLURM_jobs_report/my_analysis # Check GPU availability if command -v nvidia-smi &> /dev/null; then echo "GPU information:" nvidia-smi else echo "No GPU available" fi # Check CPU echo "CPU information:" lscpu | egrep 'Model name|Socket|Thread|NUMA|CPU\(s\)' # Activate conda environment source ~/.bashrc conda init conda activate hitmictools # Change to project directory cd '/path/to/project' # Set variables CONFIG_FILE="./config/analysis_config.yml" BLOCK_NUM=${SLURM_ARRAY_TASK_ID} FILELIST="./temp/file_block_${BLOCK_NUM}.txt" # Display configuration echo "Config file contents:" cat "$CONFIG_FILE" # Run analysis echo "Executing command:" echo "hitmictools run --config $CONFIG_FILE --worklist $FILELIST" hitmictools run --config $CONFIG_FILE --worklist $FILELIST # Display resource usage sstat --format=JobID,AveCPU,AveRSS,MaxRSS -j $SLURM_JOBID.batch sacct -o JobID,CPUTime -j $SLURM_JOBID ``` ## 4. Customizing SLURM Scripts ### Adjusting Resource Requests #### Time Limits Match `--time` to your queue's QoS: ```bash # For short jobs (< 6 hours) --qos gpu6hours --time 05:00:00 # For long jobs (< 24 hours) --qos gpu24hours --time 20:00:00 # For very long jobs --qos gpu1week --time 7-00:00:00 # 7 days ``` #### Memory Requirements Estimate memory needs: - **Basic analysis**: 20-25G per CPU - **With tracking**: 30-35G per CPU - **Large images (>4K x 4K)**: 40-50G per CPU ```bash # Low memory (small images, no tracking) --memory 20G --cpus-per-task 4 # Total: 80GB # High memory (large images, tracking) --memory 40G --cpus-per-task 4 # Total: 160GB ``` #### GPU Selection Specify GPU type and count: ```bash # Single RTX 4090 (most common) --partition rtx4090 --gres gpu:1 # Single A100 (for very large models) --partition a100 --gres gpu:1 # Multiple GPUs (advanced, requires code modification) --gres gpu:2 ``` ### Understanding Partitions and QoS (Quality of Service) **For wet lab biologists new to computing clusters:** Think of the SLURM cluster as a shared resource pool, similar to booking time on a shared microscope facility. **Partitions** are like different types of equipment (e.g., confocal microscope vs. widefield), each with specific capabilities. **QoS (Quality of Service)** is like booking time slots - you can reserve shorter time slots (30 minutes to 6 hours) which are processed faster, or longer slots (1 day to 2 weeks) for extensive analyses. The key difference: shorter time slots get higher priority in the queue, similar to how "quick scans" might be prioritized over "overnight acquisitions" on a microscope booking system. Choose the shortest time that fits your analysis to get results faster. #### Available Partitions (Computing Resources) Each partition provides different hardware optimized for specific tasks: | Partition | Hardware | GPUs | Memory | Best For | |-----------|----------|------|---------|----------| | `rtx4090` | Latest NVIDIA RTX 4090 GPUs | 8 per node | 1 TB | **HiTMicTools standard** - Fast, modern GPUs ideal for image analysis | | `a100` | NVIDIA A100 GPUs (40 GB) | 4 per node | 1 TB | Very large models or high-memory tasks | | `a100-80g` | NVIDIA A100 GPUs (80 GB) | 4 per node | 1 TB | Extremely large models (rarely needed for HiTMicTools) | | `titan` | Older NVIDIA Titan GPUs | 7 per node | 512 GB | Legacy partition (avoid if rtx4090 available) | | `scicore` | CPU-only | None | 512 GB - 1 TB | CPU-only processing (slower) | | `bigmem` | CPU-only, high memory | None | 1-2 TB | Very large memory needs without GPU | **Recommendation for HiTMicTools**: Use `rtx4090` for almost all analyses - it provides the best performance-to-availability ratio. #### Available QoS (Time Limits) QoS determines how long your job can run. Choose based on your expected processing time: | QoS | Maximum Runtime | When to Use | Example Use Case | |-----|-----------------|-------------|------------------| | `gpu30min` | 30 minutes | Quick tests, small datasets | Testing config on 1-2 images | | `gpu6hours` | 6 hours | **Most common** | Standard batch (10-50 movies) | | `gpu1day` | 24 hours | Large batches | Processing 100-200 movies | | `gpu1week` | 7 days | Very large experiments | Processing 500+ movies or tracking | | `30min` | 30 minutes | CPU-only quick tasks | Testing without GPU | | `6hours` | 6 hours | CPU-only standard | Standard analysis without GPU | | `1day` | 24 hours | CPU-only long tasks | Large CPU-only analysis | | `1week` | 7 days | CPU-only very long | Rarely needed | | `2weeks` | 14 days | CPU-only extended | Very rarely needed | #### Resource Limits by QoS The cluster enforces limits to ensure fair sharing among all users. These limits control how many resources (CPUs, GPUs, memory) you can use simultaneously across all your jobs. **GPU QoS Limits:** | QoS | Max Runtime | Total Cluster Limit | Per Account Limit | Per User Limit | |-----|-------------|---------------------|-------------------|----------------| | `gpu30min` | 30 minutes | 3,300 CPUs, 170 GPUs, 26 TB | 2,400 CPUs, 136 GPUs, 22 TB | 2,600 CPUs, 136 GPUs, 22 TB | | `gpu6hours` | 6 hours | 3,000 CPUs, 150 GPUs, 24 TB | 2,000 CPUs, 100 GPUs, 16 TB | 2,000 CPUs, 100 GPUs, 16 TB | | `gpu1day` | 24 hours | 2,500 CPUs, 120 GPUs, 20 TB | 1,250 CPUs, 60 GPUs, 10 TB | 1,250 CPUs, 60 GPUs, 10 TB | | `gpu1week` | 7 days | 1,500 CPUs, 48 GPUs, 12 TB | 750 CPUs, 24 GPUs, 6 TB | 750 CPUs, 24 GPUs, 6 TB | **CPU-only QoS Limits:** | QoS | Max Runtime | Total Cluster Limit | Per Account Limit | Per User Limit | |-----|-------------|---------------------|-------------------|----------------| | `30min` | 30 minutes | 12,000 CPUs, 68 TB | 10,000 CPUs, 50 TB | 10,000 CPUs, 50 TB | | `6hours` | 6 hours | 11,500 CPUs, 64 TB | 7,500 CPUs, 40 TB | 7,500 CPUs, 40 TB | | `1day` | 24 hours | 9,000 CPUs, 60 TB | 4,500 CPUs, 30 TB | 4,500 CPUs, 30 TB | | `1week` | 7 days | 3,800 CPUs, 30 TB | 2,000 CPUs, 15 TB | 2,000 CPUs, 15 TB | | `2weeks` | 14 days | 1,300 CPUs, 10 TB | 128 CPUs, 2 TB | 128 CPUs, 2 TB | **What this means in practice:** - If you submit 10 jobs with `gpu6hours` QoS, each requesting 1 GPU and 4 CPUs, you'll use 10 GPUs and 40 CPUs total - This is well within the per-user limit of 100 GPUs and 2,000 CPUs for `gpu6hours` - Shorter QoS options allow more concurrent jobs but less time per job - Longer QoS options allow fewer concurrent jobs but more time per job #### Recommended Configurations for HiTMicTools **Standard analysis (50-100 images):** ```bash --partition rtx4090 --qos gpu6hours --time 05:00:00 ``` **Large tracking experiment (200+ images):** ```bash --partition rtx4090 --qos gpu1day --time 20:00:00 ``` **Testing configuration (1-5 images):** ```bash --partition rtx4090 --qos gpu30min --time 00:25:00 ``` **Very large dataset (500+ images):** ```bash --partition rtx4090 --qos gpu1week --time 4-00:00:00 # 4 days ``` ### Modifying Array Size Change the array range in the script header: ```bash # For 20 blocks (0-19) #SBATCH --array=0-19 # For 50 blocks (0-49) #SBATCH --array=0-49 # Process subset of blocks (e.g., only blocks 10-20) #SBATCH --array=10-20 ``` ### Email Notifications Control when you receive emails: ```bash # All events #SBATCH --mail-type=ALL # Only failures #SBATCH --mail-type=FAIL # Start and end #SBATCH --mail-type=BEGIN,END # Disable emails # Remove or comment out --mail-user and --mail-type lines ``` ## 5. Submitting and Managing Jobs ### Submitting Jobs ```bash # Submit the job array sbatch run_analysis.sh # Submit with dependency (wait for job 12345 to complete) sbatch --dependency=afterok:12345 run_analysis.sh # Submit with limited array size (max 10 concurrent) sbatch --array=0-99%10 run_analysis.sh ``` Expected output: ``` Submitted batch job 67890 ``` ### Monitoring Jobs ```bash # Check your jobs in the queue squeue -u $USER # Detailed view of a specific job scontrol show job 67890 # Check array job status squeue -u $USER -t RUNNING,PENDING # View job progress tail -f SLURM_jobs_report/my_analysis/67890_0_HiTMicTools.out ``` ### Queue Status Output ``` JOBID PARTITION NAME USER ST TIME NODES NODELIST 67890_0 rtx4090 my_analysis user R 1:23 1 node042 67890_1 rtx4090 my_analysis user R 1:22 1 node043 67890_2 rtx4090 my_analysis user PD 0:00 1 (Resources) ``` **Status codes:** - `R` - Running - `PD` - Pending (waiting for resources) - `CG` - Completing (job finishing) - `CD` - Completed - `F` - Failed ### Canceling Jobs ```bash # Cancel a specific array task scancel 67890_5 # Cancel entire job array scancel 67890 # Cancel all your jobs scancel -u $USER # Cancel pending jobs only scancel -u $USER -t PENDING ``` ### Checking Resource Usage ```bash # After job completes, view statistics sacct -j 67890 --format=JobID,JobName,Partition,Elapsed,State,MaxRSS,MaxVMSize # For array jobs, see all tasks sacct -j 67890 --format=JobID,State,MaxRSS,Elapsed # Detailed efficiency report seff 67890_0 ``` ## 6. Configuration for SLURM ### Adapting Your Config File Your config file should use **absolute paths** on the cluster: ```yaml input_data: input_folder: "/cluster/path/to/data" output_folder: "/cluster/path/to/results" file_type: ".nd2" export_labelled_masks: false export_aligned_image: false pipeline_setup: name: "ASCT_focusrestore" parallel_processing: false # Use SLURM arrays instead num_workers: 1 # Single worker per SLURM task reference_channel: 0 pi_channel: 1 focus_correction: true align_frames: true method: "basicpy_fl" tracking: true models: model_collection: "/cluster/path/to/models/model_collection_tracking_20250529.zip" tracking: parameters_override: null ``` **Important notes:** - Set `parallel_processing: false` (SLURM handles parallelism) - Set `num_workers: 1` (each SLURM task processes one block) - Use absolute paths for portability - Keep `export_labelled_masks: false` to save disk space ### Testing Configuration Before submitting large jobs, test with a single file: ```bash # Create a test worklist with one file echo "test_image.nd2" > test_worklist.txt # Run locally on a cluster node (interactive session) srun --partition=rtx4090 --gres=gpu:1 --mem=25G --cpus-per-task=4 --pty bash conda activate hitmictools hitmictools run --config config/analysis_config.yml --worklist test_worklist.txt exit ``` ## 7. Best Practices ### Resource Optimization 1. **Right-size your requests:** - Don't request more memory/time than needed (wastes resources) - Request slightly more than expected (avoid job failures) 2. **Optimize block size:** - Aim for 2-6 hour jobs (good queue priority) - Avoid jobs < 30 min (overhead) or > 24 hours (risky) 3. **Use appropriate QoS:** - Short jobs → `gpu6hours` - Standard jobs → `gpu24hours` - Long jobs → `gpu1week` (but minimize use) ### Data Management 1. **Store data efficiently:** - Keep raw data in shared filesystem - Write results to high-speed scratch space if available - Move final results to permanent storage after completion 2. **Cleanup strategy:** ```bash # Remove temporary file blocks after completion rm -rf temp/ # Archive logs tar -czf logs_${SLURM_JOB_ID}.tar.gz SLURM_jobs_report/my_analysis/ ``` 3. **Monitor disk usage:** ```bash # Check quota quota -s # Find large files du -sh results/*/ | sort -h ``` ### Error Handling 1. **Check logs regularly:** ```bash # Find failed jobs grep -l "Error" SLURM_jobs_report/my_analysis/*.err # Count successful completions ls results/*.csv | wc -l ``` 2. **Rerun failed tasks:** ```bash # If task 5 failed, rerun just that block sbatch --array=5 run_analysis.sh # Rerun multiple failed tasks sbatch --array=3,5,7,12 run_analysis.sh ``` 3. **Common failure causes:** - Out of memory → Increase `--memory` - Timeout → Increase `--time` or reduce block size - Missing files → Check paths in config - CUDA errors → Check `--gres=gpu:1` is set ## 8. Example Workflows ### Workflow 1: Standard Analysis (100 files) ```bash # 1. Split files hitmictools split-files --target-folder ./data --n-blocks 10 --output-dir ./temp # 2. Generate script hitmictools generate-slurm \ --job-name 'exp001' \ --file-blocks \ --n-blocks 10 \ --conda-env 'hitmictools' \ --config-file './config/analysis.yml' # 3. Submit sbatch slurm_script.sh # 4. Monitor squeue -u $USER watch -n 30 'ls results/*.csv | wc -l' # Count completed files # 5. Check completion ls results/*.csv | wc -l # Should be 100 ``` ### Workflow 2: Large-Scale Tracking (500 files) ```bash # 1. Split into 25 blocks hitmictools split-files --target-folder ./data --n-blocks 25 --output-dir ./temp # 2. Generate script with more resources hitmictools generate-slurm \ --job-name 'tracking_exp' \ --file-blocks \ --n-blocks 25 \ --conda-env 'hitmictools' \ --config-file './config/tracking_config.yml' \ --memory '35G' \ --time '08:00:00' \ --qos 'gpu24hours' # 3. Submit with max 5 concurrent jobs sbatch --array=0-24%5 slurm_script.sh # 4. Monitor progress watch -n 60 'squeue -u $USER; echo ""; ls results/*.csv | wc -l' ``` ### Workflow 3: Reprocessing Subset ```bash # Reprocess only specific files cat > reprocess_list.txt << EOF data/image_042.nd2 data/image_103.nd2 data/image_205.nd2 EOF # Run without array job hitmictools run --config config/analysis.yml --worklist reprocess_list.txt ``` ## 9. Troubleshooting ### Job Pending Forever ```bash # Check why job is pending scontrol show job 67890 | grep Reason # Common reasons: # - Resources: No available nodes with requested resources # - Priority: Other jobs have higher priority # - QOSMaxCpuPerUserLimit: You've exceeded CPU limit ``` **Solutions:** - Reduce resource requests - Choose different partition - Wait for resources to free up - Cancel competing jobs if appropriate ### Out of Memory Errors Check logs: ```bash grep -i "memory" SLURM_jobs_report/my_analysis/*.err grep -i "killed" SLURM_jobs_report/my_analysis/*.err ``` **Solutions:** - Increase `--memory` in SLURM script - Reduce `num_workers` in config - Set `export_labelled_masks: false` - Process smaller file blocks ### GPU Not Available ```bash # Check SLURM script has GPU request grep "gres=gpu" slurm_script.sh # Verify GPU in job srun --jobid=67890_0 nvidia-smi ``` **Solutions:** - Add `#SBATCH --gres=gpu:1` to script - Verify partition supports GPUs - Check QoS allows GPU access ### Files Not Found ```bash # Check file paths in error logs grep "FileNotFoundError" SLURM_jobs_report/my_analysis/*.err # Verify paths are accessible from compute nodes srun --partition=rtx4090 --pty bash ls /cluster/path/to/data exit ``` **Solutions:** - Use absolute paths in config - Ensure shared filesystem is mounted - Check file permissions ## 10. Advanced Topics ### Dependency Chains Run multiple analyses in sequence: ```bash # Job 1: Preprocess JOB1=$(sbatch --parsable preprocess.sh) # Job 2: Analysis (waits for Job 1) JOB2=$(sbatch --parsable --dependency=afterok:$JOB1 analysis.sh) # Job 3: Postprocess (waits for Job 2) sbatch --dependency=afterok:$JOB2 postprocess.sh ``` ### Checkpoint and Resume For very long analyses: ```yaml # In your config, process files one at a time input_data: file_list: - "long_timeseries_001.nd2" ``` Then use multiple shorter SLURM jobs instead of one long job. ### Custom SLURM Directives Add additional directives to the generated script: ```bash # After generation, edit the script to add: #SBATCH --exclusive # Exclusive node access #SBATCH --constraint=gpu40 # Specific GPU type #SBATCH --account=proj_name # Billing account ``` ## Summary SLURM workflow with HiTMicTools: 1. Use `split-files` to create file blocks 2. Use `generate-slurm` to create submission script 3. Customize script for cluster resources 4. Submit with `sbatch` 5. Monitor with `squeue` and log files 6. Handle failures by rerunning specific array indices For help with your specific cluster, consult: - Your cluster's documentation - `man sbatch` and `man squeue` - Cluster support team